A dynamic over-sampling procedure based on sensitivity for multi-class problems

Authors

  • Francisco Fernández-Navarro
  • César Hervás-Martínez
  • Pedro Antonio Gutiérrez
Abstract

Classification with imbalanced datasets poses a new challenge for researchers in the framework of machine learning. This problem appears when the number of patterns that represent one of the classes of the dataset (usually the concept of interest) is much lower than in the remaining classes. Thus, the learning model must be adapted to this situation, which is very common in real applications. In this paper, a dynamic over-sampling procedure is proposed for improving the classification of imbalanced datasets with more than two classes. This procedure is incorporated into a memetic algorithm (MA) that optimizes radial basis function neural networks (RBFNNs). To handle class imbalance, the training data are resampled in two stages. In the first stage, an over-sampling procedure is applied to the minority class to partially balance the class sizes. Then, the MA is run and the data are over-sampled in different generations of the evolution, generating new patterns of the minimum sensitivity class (the class with the worst accuracy for the best RBFNN of the population). The proposed methodology is tested using 13 imbalanced benchmark classification datasets from well-known machine learning problems and one complex problem of microbial growth. It is compared to other neural network methods specifically designed for handling imbalanced data. These methods include different over-sampling procedures in the preprocessing stage, a threshold-moving method where the output threshold is moved toward inexpensive classes, and ensemble approaches combining the models obtained with these techniques. The results show that our proposal is able to improve the sensitivity in the generalization set and obtains both a high accuracy level and a good classification level for...
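The dynamic stage described in the abstract (find the class with the worst per-class accuracy, then generate new patterns for it) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how new patterns are generated, so the sketch assumes a SMOTE-like interpolation between two existing patterns of the class. The function names and the toy data are hypothetical.

```python
import random
from collections import Counter


def per_class_sensitivity(y_true, y_pred):
    """Return {class: fraction of that class's patterns predicted correctly}."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {c: hits[c] / totals[c] for c in totals}


def minimum_sensitivity_class(y_true, y_pred):
    """Return the class with the lowest sensitivity (ties: smallest label)."""
    sens = per_class_sensitivity(y_true, y_pred)
    return min(sorted(sens), key=lambda c: sens[c])


def oversample_class(X, y, target, n_new, rng):
    """Return copies of X and y with n_new synthetic patterns of `target` appended.

    Assumption (not from the abstract): each new pattern is a random point on
    the segment between two existing patterns of the class.
    """
    pool = [x for x, label in zip(X, y) if label == target]
    X_out, y_out = list(X), list(y)
    for _ in range(n_new):
        a, b = rng.choice(pool), rng.choice(pool)
        t = rng.random()
        X_out.append([ai + t * (bi - ai) for ai, bi in zip(a, b)])
        y_out.append(target)
    return X_out, y_out


# Toy three-class data; y_pred stands in for the output of the best model.
X = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [1.2, 0.9], [1.1, 1.1],
     [5.0, 5.0], [5.2, 4.8]]
y = [0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 1, 2, 2]

worst = minimum_sensitivity_class(y, y_pred)
X_new, y_new = oversample_class(X, y, worst, n_new=3, rng=random.Random(0))
```

In the procedure the abstract describes, a step like this would be repeated at selected generations of the evolutionary run, with the predictions taken from the best network in the current population.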

Similar resources

Determination of relative agrarian technical efficiency by a dynamic over-sampling procedure guided by minimum sensitivity

In this paper, a dynamic over-sampling procedure is proposed to improve the classification of imbalanced datasets with more than two classes. This procedure is incorporated into a Hybrid algorithm (HA) that optimizes Multi Layer Perceptron Neural Networks (MLPs). To handle class imbalance, the training dataset is resampled in two stages. In the first stage, an over-sampling procedure is applied...

A multi-stage stochastic programming for condition-based maintenance with proportional hazards model

Condition-Based Maintenance (CBM) optimization using Proportional Hazards Model (PHM) is a kind of maintenance optimization problem in which inspections of a system relevant to its failure rate depending on the age and value of covariates are performed in time intervals. The general approach for constructing a CBM based on PHM for a system is to minimize a long run average cost per unit of time...

A class of multi-agent discrete hybrid non linearizable systems: Optimal controller design based on quasi-Newton algorithm for a class of sign-undefinite hessian cost functions

In the present paper, a class of hybrid, nonlinear and non linearizable dynamic systems is considered. The noted dynamic system is generalized to a multi-agent configuration. The interaction of agents is presented based on graph theory and finally, an interaction tensor defines the multi-agent system in leader-follower consensus in order to design a desirable controller for the noted system. A...

Dynamic Cargo Trains Scheduling for Tackling Network Constraints and Costs Emanating from Tardiness and Earliness

This paper aims to develop a multi-objective model for scheduling cargo trains faced by the costs of tardiness and earliness, time limitations, queue priority and limited station lines. Based upon the Islamic Republic of Iran Railway Corporation (IRIRC) regulations, passenger trains enjoy priority over other trains for departure. Therefore, the timetable of cargo trains must be determined based...

A heuristic method for consumable resource allocation in multi-class dynamic PERT networks

This investigation presents a heuristic method for consumable resource allocation problem in multi-class dynamic Project Evaluation and Review Technique (PERT) networks, where new projects from different classes (types) arrive to system according to independent Poisson processes with different arrival rates. Each activity of any project is operated at a devoted service station located in a n...

Journal:
  • Pattern Recognition

Volume 44

Publication date: 2011